A Word Image Coding Technique and its Applications in Information Retrieval from Imaged Documents

Authors

  • Li Zhang
  • Chew Lim Tan
Abstract

To meet the needs of today's fast-evolving digital libraries, an increasing number of documents are being digitized for easy archival and dissemination. As a result, Document Image Retrieval (DIR), as part of the information retrieval (IR) paradigm, has been receiving attention in the IR community in recent years. This paper presents two DIR applications based on a word image coding technique that extracts features from each word image object and represents them as feature code strings for comparison. The first application is a web-based retrieval system that retrieves document images online from digital libraries based on a set of input query words. The second is a plug-in search tool embedded in Acrobat Reader that performs word spotting within the opened document images and marks the matching words explicitly in the document. Both applications achieve good precision and recall in our experiments on document images such as students' theses provided by our university digital library.
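
The abstract does not say how feature code strings are compared, so the following is only a minimal sketch under stated assumptions: each word image has already been reduced to a code string, and two strings match when their normalized edit distance is small. The function names, the toy code alphabet, and the threshold are all hypothetical and are not taken from the paper.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two code strings (assumed comparison metric)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means the code strings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def spot_words(query_code: str, word_codes: List[str],
               threshold: float = 0.75) -> List[int]:
    """Return indices of word codes whose similarity to the query meets the threshold.

    The threshold is a hypothetical tuning parameter, not a value from the paper.
    """
    return [i for i, code in enumerate(word_codes)
            if similarity(query_code, code) >= threshold]
```

In a word-spotting setting such as the plug-in described above, the returned indices would identify which word images on the page to highlight; tolerating a small edit distance is one way to absorb feature extraction noise, at the cost of some false matches.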

Similar resources

Retrieval of machine-printed Latin documents through Word Shape Coding

This paper reports a document retrieval technique that retrieves machine-printed Latin-based document images through word shape coding. Adopting the idea of image annotation, a word shape coding scheme is proposed, which converts each word image into a word shape code by using a few shape features. The text contents of imaged documents are thus captured by a document vector constructed with the...

Connected Component Based Word Spotting on Persian Handwritten image documents

Word spotting makes unindexed image documents searchable by locating a word or words in a document image, given a query word. This problem is challenging, mainly due to the large number of word classes with very small inter-class and substantial intra-class distances. In this paper, a segmentation-based word spotting method is presented for multi-writer Persian handwritten doc-...

Prototyping a Vibrato-Aware Query-By-Humming (QBH) Music Information Retrieval System for Mobile Communication Devices: Case of Chromatic Harmonica

Background and Aim: The current research aims at prototyping query-by-humming music information retrieval systems for smartphones. Methods: This multi-method research simultaneously follows the simulation technique, drawn from the mixed models of operations research methodology, and the documentary research method. Two chromatic harmonica albums comprised the research population. To achieve the purpose ...

Image Based Word Retrieval Method for Unrestricted Textline Direction Documents

... differs, and to apply those techniques for Chinese. It is very important to perform full-text retrieval of document information accumulated in the past. Although retrieval technologies for ASCII text documents are well established, highly precise character retrieval from image-based documents such as bitmap images is not easy. In this paper, a word retrieval technique for a...

Document Image Retrieval Based on Keyword Spotting Using Relevance Feedback

Keyword spotting is a well-known method in document image retrieval. In this method, search in document images is based on a query word image. In this paper, an approach to document image retrieval based on keyword spotting is proposed. The proposed method presents a framework that uses relevance feedback. Relevance feedback, an interactive and efficient method, is used in this paper to imp...

Publication date: 2004